Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification
Authors
Abstract
Crowdsourcing is an emerging collaborative approach applicable, among many other fields, to language and speech processing. Indeed, crowdsourcing has already been applied in speech processing with promising results. However, only a few studies have investigated its use in computational paralinguistics. In this contribution, we propose a novel evaluator for crowdsourcing-based ratings, termed the Weighted Trustability Evaluator (WTE), which is computed from each rater's consistency over the test questions. We further investigate the reliability of crowdsourced annotations compared with those obtained through traditional labelling procedures, such as constrained listening experiments in laboratories or other controlled environments. This comparison includes an in-depth analysis of the obtainable classification performance. The experiments were conducted on the Speaker Likability Database (SLD), already used in the INTERSPEECH 2012 Challenge, and the results lend further weight to the assumption that crowdsourcing can serve as a reliable annotation source for computational paralinguistics, given a sufficient number of raters and suitable measures of their reliability.
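The abstract does not spell out the WTE formula, so the following is only a minimal sketch of the general idea it describes: score each rater by their consistency on test questions with known gold labels, then use those scores as weights when aggregating crowd labels. All function names and the agreement-rate weighting are illustrative assumptions, not the paper's exact definition.

```python
# Sketch of a Weighted-Trustability-style aggregation (assumed formulation;
# the exact WTE definition is given in the paper itself).
import numpy as np

def rater_trustability(test_answers, gold_labels):
    """Fraction of test questions a rater answered consistently with gold."""
    return float(np.mean(np.asarray(test_answers) == np.asarray(gold_labels)))

def weighted_label(ratings, weights, classes=(0, 1)):
    """Weighted majority vote over one item's crowd ratings."""
    scores = {c: sum(w for r, w in zip(ratings, weights) if r == c)
              for c in classes}
    return max(scores, key=scores.get)

# Toy usage: three raters, binary likability labels (1 = likable).
gold = [1, 0, 1, 1]                                  # shared test questions
raters = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 0, 0]]  # each rater's answers
w = [rater_trustability(a, gold) for a in raters]    # -> [1.0, 0.5, 0.0]
item_ratings = [1, 1, 0]             # one stimulus, one vote per rater
print(weighted_label(item_ratings, w))               # -> 1
```

Under this weighting, a rater who contradicts the gold standard contributes nothing to the vote, so a sufficient number of raters plus the reliability measure together determine the aggregated label, which is the assumption the abstract argues for.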
Similar resources
"Would You Buy a Car from Me?" - On the Likability of Telephone Voices
We investigated how "likable" or "pleasant" a speaker appears, based on a subset of the "Agender" database recently introduced at the INTERSPEECH 2010 Paralinguistic Challenge. 32 participants rated the stimuli for likability on a seven-point scale. An ANOVA showed that the rated samples differ significantly, although the inter-rater agreement is not very high. Exper...
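For readers unfamiliar with this style of analysis, the sketch below reconstructs it on synthetic data: a one-way ANOVA testing whether stimuli differ in mean likability, plus a crude inter-rater agreement score (mean pairwise correlation). The data layout and numbers are invented for illustration; only the design (32 raters, 7-point scale) comes from the entry above.

```python
# Illustrative reconstruction with synthetic ratings, not the study's data.
import numpy as np
from itertools import combinations
from scipy.stats import f_oneway, pearsonr

rng = np.random.default_rng(0)
n_raters, n_stimuli = 32, 10
true_likability = rng.uniform(2, 6, n_stimuli)        # latent 7-point means
ratings = np.clip(true_likability
                  + rng.normal(0, 1.5, (n_raters, n_stimuli)), 1, 7)

# One-way ANOVA: one group of 32 ratings per stimulus.
f_stat, p_val = f_oneway(*ratings.T)
print(f"F = {f_stat:.2f}, p = {p_val:.3g}")

# Agreement: average correlation between every pair of raters' profiles.
agreement = np.mean([pearsonr(ratings[i], ratings[j])[0]
                     for i, j in combinations(range(n_raters), 2)])
print(f"mean pairwise r = {agreement:.2f}")
```

This pairing mirrors the entry's finding: stimuli can differ significantly in the ANOVA sense even when the pairwise agreement between individual raters stays modest.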
Large-Scale Speaker Ranking from Crowdsourced Pairwise Listener Ratings
Speech quality and likability form a multi-faceted phenomenon consisting of a combination of perceptual features that can neither easily be computed nor weighted automatically. Yet, it is often easy to decide which of two voices one likes better, even though it would be hard to describe why, or to name the underlying basic perceptual features. Although likability is inherently subjective and individual...
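The entry above turns pairwise "which voice do you like better?" judgements into a ranking. A standard tool for that step (not necessarily the one used in this particular paper) is the Bradley-Terry model; a minimal fit with the classic MM iteration looks like this:

```python
# Bradley-Terry ranking from pairwise win counts; an assumed, generic
# technique for pairwise aggregation, shown here on toy data.
import numpy as np

def bradley_terry(wins, n_iter=100):
    """wins[i, j] = number of times voice i beat voice j."""
    n = wins.shape[0]
    p = np.ones(n)                       # latent "likability" strengths
    for _ in range(n_iter):
        for i in range(n):
            num = wins[i].sum()          # total wins of voice i
            den = sum((wins[i, j] + wins[j, i]) / (p[i] + p[j])
                      for j in range(n) if j != i)
            p[i] = num / den if den > 0 else p[i]
        p /= p.sum()                     # normalise for stability
    return p

# Toy example: voice 0 mostly beats 1, which mostly beats 2.
wins = np.array([[0, 8, 9],
                 [2, 0, 7],
                 [1, 3, 0]])
strengths = bradley_terry(wins)
print(np.argsort(-strengths))            # ranking, best first -> [0 1 2]
```

The appeal of this setup matches the entry's argument: listeners only answer easy binary questions, and the model recovers a global ranking without anyone having to name or weight the underlying perceptual features.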
Perceptual Ratings of Voice Likability Collected Through In-Lab Listening Tests vs. Mobile-Based Crowdsourcing
Human perceptions of speaker characteristics, needed to perform automatic predictions from speech features, have generally been collected by conducting demanding in-lab listening tests under controlled conditions. Concurrently, crowdsourcing has emerged as a valuable approach for running user studies through surveys or quantitative ratings. Micro-task crowdsourcing markets enable the completion...
Pair-Comparison for Collecting Voice Likability Ratings: Laboratory vs. Crowdsourcing
Crowdsourcing has established itself as a powerful tool, currently adopted in multiple domains as a means of collecting human input for data acquisition and labelling. Experiments conventionally executed in a laboratory setup can now reach a wider audience while controlling for its diversity. However, the question remains whether crowdsourcing outcomes are valid and reliable, ...
Intelligent Systems' Holistic Evolving Analysis of Real-Life Universal Speaker Characteristics
In this position paper we present the FP7 ERC Starting Grant project iHEARu (Intelligent systems' Holistic Evolving Analysis of Real-life Universal speaker characteristics). This project addresses several fundamental shortcomings in state-of-the-art methods for computational paralinguistics by introducing holistic analysis, evolving learning of features and models, and collection of real-life, ...
Publication date: 2016